2025-01-23
See Chapters 19-20 in our Course Notes for more on logistic regression and related models.
The US space shuttle Challenger exploded on 1986-01-28. An investigation ensued into the reliability of the shuttle’s propulsion system. The explosion was eventually traced to the failure of one of the three field joints on one of the two solid booster rockets. Each of these six field joints includes two O-rings which can fail.
We have data on 23 space shuttle flights that preceded Challenger on primary O-ring erosion and/or blowby and on the temperature in degrees Fahrenheit. No previous liftoff temperature was under 53 degrees F.
damage = number of damage incidents out of 6 possible
burst = 1 if damage > 0

orings1 <- faraway::orings |> tibble() |>
  mutate(burst = case_when(damage > 0 ~ 1, TRUE ~ 0))
orings1 |> summary()

      temp           damage           burst
Min. :53.00 Min. :0.0000 Min. :0.0000
1st Qu.:67.00 1st Qu.:0.0000 1st Qu.:0.0000
Median :70.00 Median :0.0000 Median :0.0000
Mean :69.57 Mean :0.4783 Mean :0.3043
3rd Qu.:75.00 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :81.00 Max. :5.0000 Max. :1.0000
burst and temp

ggplot(orings1, aes(x = factor(burst), y = temp)) +
geom_violin() +
geom_boxplot(aes(fill = factor(burst)), width = 0.3) +
stat_summary(geom = "point", fun = mean, col = "white", size = 2.5) +
guides(fill = "none") +
labs(title = "Are bursts more common at low temperatures?",
subtitle = "23 prior space shuttle launches",
x = "Was there a burst? (1 = yes, 0 = no)",
y = "Launch Temp (F)")

burst and temp

We want to treat the binary variable burst as the outcome, and temp as the predictor.
lm()

fit1 <- lm(burst ~ temp, data = orings1)
tidy(fit1, conf.int = T) |> gt() |>
fmt_number(decimals = 3) |> tab_options(table.font.size = 20)

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 2.905 | 0.842 | 3.450 | 0.002 | 1.154 | 4.656 |
| temp | −0.037 | 0.012 | −3.103 | 0.005 | −0.062 | −0.012 |
\[ \operatorname{\widehat{burst}} = 2.905 - 0.037(\operatorname{temp}) \]
ggplot(orings1, aes(x = temp, y = burst)) +
geom_jitter(col = "navy", size = 3, width = 0, height = 0.1) +
geom_smooth(method = "lm", se = F, col = "red",
formula = y ~ x) +
labs(title = "Bursts more common at lower temperatures",
subtitle = "23 prior space shuttle launches",
y = "Was there a burst? (1 = yes, 0 = no)",
x = "Launch Temp (F)")

What does fit1 predict for the probability of a burst if the temperature at launch is 70 degrees F?

Let’s use our linear probability model fit1 to predict the probability of a burst at some other temperatures…
# A tibble: 5 × 2
temp .fitted
<dbl> <dbl>
1 80 -0.0857
2 70 0.288
3 60 0.662
4 50 1.04
5 31 1.75
Problems with fit1: the linear probability model predicts “probabilities” below 0 (at temp = 80) and above 1 (at temp = 50 and colder), as the predictions above show.

Our outcome takes on two values (zero or one), and we then model the probability of a “one” response given a linear function of predictors.
Idea 1: Use a linear probability model
Idea 2: Build a non-linear regression approach
glm()

The function we use in logistic regression is called the logit link.
\[ logit(\pi) = log\left( \frac{\pi}{1 - \pi} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k \]
The inverse of the logit function is called the logistic function. If logit(\(\pi\)) = \(\eta\), then \(\pi = \frac{exp(\eta)}{1 + exp(\eta)}\).
We usually focus on the logit in statistical work, which is the inverse of the logistic function.
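The logit and logistic functions are easy to code directly. A quick base R sketch (the function names here are my own, for illustration) confirms that they are inverses of each other:

```r
# logit: probability -> log odds; logistic: log odds -> probability
logit    <- function(p)   log(p / (1 - p))
logistic <- function(eta) exp(eta) / (1 + exp(eta))

logistic(logit(0.3))   # recovers 0.3

# base R builds these in as qlogis() and plogis()
all.equal(plogis(qlogis(0.3)), 0.3)
```
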
We’ll use the glm function in R, specifying a logistic regression model.
fit2 for Prob(burst)

fit2 <- glm(burst ~ temp, data = orings1,
family = binomial(link = "logit"))
tidy(fit2, conf.int = TRUE) |> gt() |>
fmt_number(decimals = 3) |> tab_options(table.font.size = 24)

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 15.043 | 7.379 | 2.039 | 0.041 | 3.331 | 34.342 |
| temp | −0.232 | 0.108 | −2.145 | 0.032 | −0.515 | −0.061 |
\[ \log\left[ \frac { \widehat{P( \operatorname{burst} = \operatorname{1} )} }{ 1 - \widehat{P( \operatorname{burst} = \operatorname{1} )} } \right] = 15.043 - 0.232(\operatorname{temp}) \]
fit2’s predictions

At a launch temperature of 70 degrees F, fit2 estimates the log odds of a burst as 15.043 − 0.232(70) = −1.197, so the estimated odds are exp(−1.197) ≈ 0.302, and

\[ Pr(burst) = \frac{0.302}{(0.302+1)} = 0.232. \]
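We can check this arithmetic directly, using the rounded fit2 coefficients from the table above (so the last digit may differ slightly from what R reports internally):

```r
b0 <- 15.043; b1 <- -0.232          # rounded fit2 coefficients
eta  <- b0 + b1 * 70                # log odds at temp = 70
odds <- exp(eta)                    # estimated odds of a burst
prob <- odds / (1 + odds)           # estimated Pr(burst)
round(c(log_odds = eta, odds = odds, prob = prob), 3)
```
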
fit2 for temp = 60

What is the predicted probability of a burst if the temperature is 60 degrees?
predict(fit2)

What is the predicted probability of a burst?
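With fit2 in your session, predict() with type = "response" returns probabilities directly; we can also check the temp = 60 case by hand with plogis(), using the rounded coefficients from the table (a sketch, so the last digit may differ):

```r
# predict(fit2, newdata = data.frame(temp = 60), type = "response")
# (assumes fit2 from above is available in your session)

# By hand, with base R's inverse logit and the rounded coefficients:
plogis(15.043 - 0.232 * 60)   # roughly 0.75 with these rounded values
```
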
Can augment do this, as well?

Yes, and it will retain many more decimal places in intermediate calculations…
Use the augment function to get the fitted probabilities into the original data, then plot.
We are just connecting the observed temp values with geom_line, so the appearance of the function isn’t as smooth as the actual logistic regression model.

fit1 and fit2

How can we interpret the fit2 coefficients?
\[ logit(burst) = log(odds(burst)) = 15.043 - 0.232 \times temp \]
Suppose Launch A’s temperature was one degree higher than Launch B’s.
The estimated odds ratio for a burst when temp differs by 1 degree is exp(−0.232) = 0.793.

tidy(fit2, exponentiate = TRUE, conf.int = TRUE, conf.level = 0.90) |>
filter(term == "temp") |>
gt() |> fmt_number(decimals = 3) |>
tab_options(table.font.size = 24)

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| temp | 0.793 | 0.108 | −2.145 | 0.032 | 0.632 | 0.919 |
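The exponentiated slope in the table can be reproduced directly from the log-odds estimate:

```r
exp(-0.232)   # exponentiated temp slope: odds ratio of about 0.793
```
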
What would it mean if the odds ratio for temp was 1? An odds ratio of 1 would indicate no association: the estimated odds of a burst would be the same regardless of temp.

Linear Probability Model (a linear model)
lm(event ~ predictor1 + predictor2 + ..., data = tibblename)
Logistic Regression Model (generalized linear model)
glm(event ~ pred1 + pred2 + ..., data = tibblename,
family = binomial(link = "logit"))
\[ logit(event) = log\left( \frac{Pr(event)}{1 - Pr(event)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k \]
\[ odds(event) = \frac{Pr(event)}{1 - Pr(event)} \]
\[ Pr(event) = \frac{odds(event)}{odds(event) + 1} \]
\[ Pr(event) = \frac{exp(logit(event))}{1 + exp(logit(event))} \]
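These identities are easy to verify numerically; here is a small sketch with an arbitrary probability:

```r
p    <- 0.232          # any probability in (0, 1)
odds <- p / (1 - p)    # odds(event) = Pr / (1 - Pr)

# both routes back recover the original probability:
c(from_odds  = odds / (odds + 1),
  from_logit = exp(log(odds)) / (1 + exp(log(odds))))
```
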
model_parameters() for fit2

Parameter   |  Log-Odds |   SE |         90% CI |     z |     p
--------------------------------------------------------------
(Intercept) | 15.04 | 7.38 | [ 4.95, 30.48] | 2.04 | 0.041
temp | -0.23 | 0.11 | [-0.46, -0.08] | -2.14 | 0.032
Uncertainty intervals (profile-likelihood) and p-values (two-tailed)
computed using a Wald z-distribution approximation.
The model has a log- or logit-link. Consider using `exponentiate =
TRUE` to interpret coefficients as ratios.
fit2 slope (and CI)

Sample odds ratio for temp is 0.79, with 90% CI (0.63, 0.92).
Comparing fit2 to a null model

We can compare fit2 to a model with only an intercept term (no temp information), using a Rao score test (test = "Rao") or a chi-square test (test = "Chisq") within anova().

Analysis of Deviance Table
Model: binomial, link: logit
Response: burst
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
NULL 22 28.267
temp 1 7.952 21 20.315 7.2312 0.007165 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
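The table above comes from anova() applied to fit2 (a sketch, assuming fit2 is in your session). Its p-values can be reproduced from the chi-square reference distribution on 1 df, using the drop in deviance (28.267 − 20.315 = 7.952) and the Rao statistic:

```r
# anova(fit2, test = "Rao")   # reproduces the table above (assumes fit2 exists)

# p-values from the chi-square distribution on 1 df:
pchisq(28.267 - 20.315, df = 1, lower.tail = FALSE)  # likelihood ratio test
pchisq(7.2312, df = 1, lower.tail = FALSE)           # Rao score test, ~0.0072
```
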
The Receiver Operating Characteristic (ROC) curve is the first approach we’ll discuss today.
| AUC | Interpretation |
|---|---|
| 0.5 | A coin-flip. Model is no better than flipping a coin. |
| 0.6 | Still a fairly weak model. |
| 0.7 | Low end of an “OK” model fit. |
| 0.8 | Pretty good predictive performance. |
| 0.9 | Outstanding predictive performance. |
| 1.0 | Perfect predictive performance. |
Recall that our fit2 has AUC = 0.7857.
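As a reminder of what the AUC measures, here is a toy base R sketch (with invented numbers, not the orings data): the AUC is the probability that a randomly chosen event receives a higher predicted probability than a randomly chosen non-event, counting ties as half.

```r
p_event <- c(0.8, 0.6, 0.4)   # hypothetical predicted probs for actual events
p_none  <- c(0.7, 0.3, 0.2)   # hypothetical predicted probs for non-events

# compare every event to every non-event; ties count 1/2
wins <- outer(p_event, p_none, ">") + 0.5 * outer(p_event, p_none, "==")
mean(wins)   # AUC for this toy example: 7/9
```
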
Classifying with the fit2 model: predict a burst when the predicted probability exceeds 0.5.

          actual
predicted  1  0
        1  4  0
        0  3 16
Confusion Matrix and Statistics
actual
predicted 1 0
1 4 0
0 3 16
Accuracy : 0.8696
95% CI : (0.6641, 0.9722)
No Information Rate : 0.6957
P-Value [Acc > NIR] : 0.04928
Kappa : 0.6497
Mcnemar's Test P-Value : 0.24821
Sensitivity : 0.5714
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 0.8421
Prevalence : 0.3043
Detection Rate : 0.1739
Detection Prevalence : 0.1739
Balanced Accuracy : 0.7857
'Positive' Class : 1
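The summary statistics above follow directly from the 2x2 table; recomputing a few by hand:

```r
TP <- 4; FP <- 0; FN <- 3; TN <- 16      # counts from the confusion matrix
sens <- TP / (TP + FN)                   # sensitivity: 4/7  = 0.5714
spec <- TN / (TN + FP)                   # specificity: 16/16 = 1.0000
acc  <- (TP + TN) / (TP + FP + FN + TN)  # accuracy: 20/23 = 0.8696
bal  <- (sens + spec) / 2                # balanced accuracy: 0.7857
```
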
glance() (1/2)

# A tibble: 1 × 8
null.deviance df.null logLik AIC BIC deviance df.residual nobs
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 28.3 22 -10.2 24.3 26.6 20.3 21 23
- nobs: we fit fit2 using 23 observations
- the null model has 22 df (df.null) with null.deviance of 28.3
- fit2 (includes temp) has 21 residual df (df.residual) with deviance of 20.3
glance() (2/2)

glance(fit2) |>
gt() |>
fmt_number(columns = c(-df.null, -df.residual, -nobs), decimals = 2) |>
tab_options(table.font.size = 24)

| null.deviance | df.null | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|
| 28.27 | 22 | −10.16 | 24.32 | 26.59 | 20.32 | 21 | 23 |
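The glance() columns are linked by simple identities (using k = 2 estimated coefficients and n = 23 observations); a quick check with the tabled values:

```r
ll <- -10.158; k <- 2; n <- 23   # logLik, number of coefficients, sample size
dev <- -2 * ll                   # deviance: 20.32, matches the table
aic <- dev + 2 * k               # AIC: 24.32
bic <- dev + k * log(n)          # BIC: 26.59
```
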
- fit2 has deviance = -2 * log likelihood (logLik)
- AIC and BIC are for comparing models for the same outcome, as in linear regression (smaller values indicate better fits, as usual).

model_performance(fit2) (1/5)

# Indices of model performance
AIC | AICc | BIC | Tjur's R2 | RMSE | Sigma | Log_loss | Score_log
---------------------------------------------------------------------------
24.315 | 24.915 | 26.586 | 0.338 | 0.372 | 1.000 | 0.442 | -2.957
AIC | Score_spherical | PCP
--------------------------------
24.315 | 0.149 | 0.720
- AIC and BIC are the Akaike and Bayes information criteria
- AICc is a corrected AIC (correction for small sample size)
- Sigma is the estimated residual standard deviation
- RMSE estimates the root mean squared error

model_performance(fit2) (2/5)

\[ \mbox{Tjur's R2} = 0.338 \mbox{ for fit2} \]
Tjur's R2 is Tjur’s coefficient of determination. Higher values indicate better fit.
model_performance(fit2) (3/5)

\[ \mbox{Log_loss} = 0.442 \mbox{ for fit2} \]
Log_loss quantifies prediction quality. If \(y_i\) is the actual/true value (1 or 0), \(p_i\) is the predicted probability, and \(\ln\) is the natural logarithm, then:

\[ \mbox{Log_loss}_i = - [y_i \ln(p_i) + (1 - y_i) \ln(1-p_i)] \]
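A direct transcription of this per-observation formula (the function name here is my own):

```r
# per-observation log loss for a binary outcome y and predicted probability p
log_loss <- function(y, p) -(y * log(p) + (1 - y) * log(1 - p))

log_loss(1, 0.232)   # a burst predicted with probability 0.232: loss ~ 1.46
log_loss(0, 0.232)   # a non-burst with the same prediction: loss ~ 0.26
```

Confident predictions that turn out wrong are penalized heavily, which is why smaller values indicate better predictions.
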
The model’s Log_loss combines these individual Log_loss values; smaller Log_loss values indicate better predictions.

model_performance() (4/5)

Score_log | Score_spherical
----------------------------
-2.957 | 0.149
- Score_log and Score_spherical are two other scoring rules for predictive performance in a logistic regression.
- Score_log takes values in (-\(\infty\), 0], with values closer to 0 indicating a more accurate model.
- Score_spherical takes values in [0, 1], with values closer to 1 indicating a more accurate model.

model_performance() (5/5)

\[ \mbox{PCP} = 0.720 \mbox{ for fit2} \]
- PCP is called the percentage of correct predictions
- PCP = the sum of predicted probabilities where y = 1, plus the sum of (1 - predicted probability) where y = 0, divided by the number of observations
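A direct transcription of this definition, applied to invented values (not the orings data; the function name is my own):

```r
# percentage of correct predictions for binary outcomes y and predictions p
pcp <- function(y, p) (sum(p[y == 1]) + sum(1 - p[y == 0])) / length(y)

y <- c(1, 1, 0, 0)            # hypothetical outcomes
p <- c(0.9, 0.4, 0.2, 0.3)    # hypothetical predicted probabilities
pcp(y, p)                     # (0.9 + 0.4 + 0.8 + 0.7) / 4 = 0.7
```
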
PCP ranges from 0 (worst) to 1 (best).

432 Class 04 | 2025-01-23 | https://thomaselove.github.io/432-2025/